AITopics | dual model

Towards Understanding How Transformers Learn In-context Through a Representation Learning Lens

Neural Information Processing Systems, Mar-17-2026, 19:56:06 GMT

Pre-trained large language models based on Transformers have demonstrated remarkable in-context learning (ICL) abilities. With just a few demonstration examples, the models can implement new tasks without any parameter updates. However, it is still an open question to understand the mechanism of ICL. In this paper, we attempt to explore the ICL process in Transformers through a lens of representation learning. Initially, leveraging kernel methods, we figure out a dual model for one softmax attention layer.
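As a rough illustration of the kernel reading of softmax attention mentioned in the abstract, the sketch below checks numerically that one softmax attention output equals a kernel-weighted average of the values under an exponential kernel. This is the standard kernel-smoothing view only, not the dual model derived in the paper; the function names and the 1/sqrt(d) scaling are assumptions made for the example.

```python
import numpy as np

def softmax_attention(q, K, V):
    """Single-query softmax attention: softmax(K q / sqrt(d)) applied to V."""
    d = q.shape[-1]
    scores = K @ q / np.sqrt(d)
    scores = scores - scores.max()  # shift for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()
    return weights @ V

def exp_kernel(q, k):
    """Hypothetical helper: exponential kernel exp(<q, k> / sqrt(d))."""
    return np.exp(q @ k / np.sqrt(q.shape[-1]))

def kernel_smoother(q, K, V):
    """Kernel-weighted average of the rows of V, weighted by exp_kernel(q, k_i)."""
    kvals = np.array([exp_kernel(q, k) for k in K])
    return (kvals[:, None] * V).sum(axis=0) / kvals.sum()

rng = np.random.default_rng(0)
q = rng.normal(size=4)        # one query vector
K = rng.normal(size=(5, 4))   # five key vectors
V = rng.normal(size=(5, 3))   # five value vectors
same = np.allclose(softmax_attention(q, K, V), kernel_smoother(q, K, V))
```

The two functions agree because the softmax normalization is exactly the sum of kernel values in the denominator of the smoother.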

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Code Generation as a Dual Task of Code Summarization

Bolin Wei, Ge Li, Xin Xia, Zhiyi Fu, Zhi Jin

Neural Information Processing Systems, Feb-14-2026, 19:37:58 GMT

Neural Information Processing Systems http://nips.cc/

dataset, regularization term, source code, (16 more...)

Neural Information Processing Systems

Country:

Asia > China (0.04)
Oceania > Australia (0.04)
North America > Canada (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.65)


e52ad5c9f751f599492b4f087ed7ecfc-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-14-2026, 19:37:43 GMT

Due to limited time, we evaluated SNM [Yin and Neubig, 2017] on the Python dataset. SNM explicitly introduces the constraints of grammar rules when generating ASTs. The BLEU score for SNM is 10.62, similar to that of our Basic model, indicating that the CG task on this dataset is very challenging. In particular, all predictions of SNM are valid, whereas the percentage of valid code generated by the dual model is low (Table 1). Since the CS and CG models are trained at the same time and the parameters of the two models are separate after the joint training, i.e., the two models solve their respective tasks separately after the joint training, the number of parameters of each dual model is the same as that of the basic model.

artificial intelligence, machine learning, pleasereadourreplytoreviewer, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.62)


5470abe68052c72afb19be45bb418d02-Paper.pdf

Neural Information Processing Systems, Feb-8-2026, 17:26:43 GMT

molecule, retrosynthesis, template, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)


Towards Understanding How Transformers Learn In-context Through a Representation Learning Lens

Neural Information Processing Systems, Oct-9-2025, 17:04:03 GMT

Pre-trained large language models based on Transformers have demonstrated remarkable in-context learning (ICL) abilities. With just a few demonstration examples, the models can implement new tasks without any parameter updates. However, it is still an open question to understand the mechanism of ICL.

attention layer, experiment, gradient descent, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Beijing > Beijing (0.04)
Europe > Italy > Apulia > Bari (0.04)

Genre:

Research Report > Experimental Study (0.92)
Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)


Code Generation as a Dual Task of Code Summarization

Bolin Wei, Ge Li, Xin Xia, Zhiyi Fu, Zhi Jin

Neural Information Processing Systems, Aug-20-2025, 07:16:31 GMT

On the other hand, CG is an indispensable process in which programmers write code to implement specific intents [Balzer, 1985]. Proper comments and correct code can massively improve programmers' productivity and enhance software quality.

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.65)


e52ad5c9f751f599492b4f087ed7ecfc-AuthorFeedback.pdf

Neural Information Processing Systems, Aug-20-2025, 07:16:18 GMT

dual model, joint training, valid code, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)


Towards Understanding How Transformers Learn In-context Through a Representation Learning Lens

Neural Information Processing Systems, May-26-2025, 14:48:12 GMT

Pre-trained large language models based on Transformers have demonstrated remarkable in-context learning (ICL) abilities. With just a few demonstration examples, the models can implement new tasks without any parameter updates. However, it is still an open question to understand the mechanism of ICL. In this paper, we attempt to explore the ICL process in Transformers through a lens of representation learning. Initially, leveraging kernel methods, we figure out a dual model for one softmax attention layer.

artificial intelligence, attention layer, machine learning, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


FedFixer: Mitigating Heterogeneous Label Noise in Federated Learning

Ji, Xinyuan, Zhu, Zhaowei, Xi, Wei, Gadyatskaya, Olga, Song, Zilong, Cai, Yong, Liu, Yang

arXiv.org Artificial Intelligence, Mar-25-2024

Federated Learning (FL) heavily depends on label quality for its performance. However, the label distribution among individual clients is always both noisy and heterogeneous. The high loss incurred by client-specific samples in heterogeneous label noise poses challenges for distinguishing between client-specific and noisy label samples, impacting the effectiveness of existing label noise learning approaches. To tackle this issue, we propose FedFixer, where the personalized model is introduced to cooperate with the global model to effectively select clean client-specific samples. In the dual models, updating the personalized model solely at a local level can lead to overfitting on noisy data due to limited samples, consequently affecting both the local and global models' performance. To mitigate overfitting, we address this concern from two perspectives. Firstly, we employ a confidence regularizer to alleviate the impact of unconfident predictions caused by label noise. Secondly, a distance regularizer is implemented to constrain the disparity between the personalized and global models. We validate the effectiveness of FedFixer through extensive experiments on benchmark datasets. The results demonstrate that FedFixer can perform well in filtering noisy label samples on different clients, especially in highly heterogeneous label noise scenarios.
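To make the idea of a distance regularizer concrete, here is a minimal sketch of a penalty that grows with the gap between a personalized model and the global model. The squared L2 form, the weight `lam`, and the function name `distance_penalty` are assumptions chosen for illustration; the abstract does not state the exact formula FedFixer uses.

```python
import numpy as np

def distance_penalty(personal_params, global_params, lam=0.1):
    """Assumed form: (lam / 2) * sum of squared differences over all parameter arrays."""
    total = 0.0
    for p, g in zip(personal_params, global_params):
        total += float(np.sum((p - g) ** 2))
    return 0.5 * lam * total

# Toy parameters: one weight matrix and one bias vector per model.
global_params = [np.zeros((2, 2)), np.zeros(2)]
personal_params = [np.ones((2, 2)), np.zeros(2)]

# This term would be added to the local training loss of the personalized model.
penalty = distance_penalty(personal_params, global_params, lam=1.0)
```

A larger `lam` keeps the personalized model closer to the global one, which is one plausible way to limit overfitting to a client's few noisy samples.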

fedfixer, label noise, noisy label, (15 more...)

arXiv.org Artificial Intelligence

2403.16561

Country:

North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
Europe > Netherlands > South Holland > Leiden (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Federated Semi-Supervised Learning with Annotation Heterogeneity

Shang, Xinyi, Huang, Gang, Lu, Yang, Lou, Jian, Han, Bo, Cheung, Yiu-ming, Wang, Hanzi

arXiv.org Artificial Intelligence, Mar-4-2023

Federated Semi-Supervised Learning (FSSL) aims to learn a global model from different clients in an environment with both labeled and unlabeled data. Most of the existing FSSL work generally assumes that both types of data are available on each client. In this paper, we study a more general problem setup of FSSL with annotation heterogeneity, where each client can hold an arbitrary percentage (0%-100%) of labeled data. To this end, we propose a novel FSSL framework called Heterogeneously Annotated Semi-Supervised LEarning (HASSLE). Specifically, it is a dual-model framework with two models trained separately on labeled and unlabeled data such that it can be simply applied to a client with an arbitrary labeling percentage. Furthermore, a mutual learning strategy called Supervised-Unsupervised Mutual Alignment (SUMA) is proposed for the dual models within HASSLE with global residual alignment and model proximity alignment. Subsequently, the dual models can implicitly learn from both types of data across different clients, although each dual model is only trained locally on a single type of data. Experiments verify that the dual models in HASSLE learned by SUMA can mutually learn from each other, thereby effectively utilizing the information of both types of data across different clients.

artificial intelligence, inductive learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2303.02445

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Virginia (0.04)
Europe > Italy (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.92)

